{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Running the BERT-Squad model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**This tutorial shows how to run the BERT-Squad model with ONNX Runtime.**\n",
    "\n",
    "To see how the BERT-Squad model was converted from TensorFlow to ONNX, see [BertTutorial.ipynb](https://github.com/onnx/tensorflow-onnx/blob/master/tutorials/BertTutorial.ipynb)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Step 1 - Write the input file containing the context paragraph and the questions for the model to answer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Writing inputs.json\n"
     ]
    }
   ],
   "source": [
    "%%writefile inputs.json\n",
    "{\n",
    "  \"version\": \"1.4\",\n",
    "  \"data\": [\n",
    "    {\n",
    "      \"paragraphs\": [\n",
    "        {\n",
    "          \"context\": \"In its early years, the new convention center failed to meet attendance and revenue expectations.[12] By 2002, many Silicon Valley businesses were choosing the much larger Moscone Center in San Francisco over the San Jose Convention Center due to the latter's limited space. A ballot measure to finance an expansion via a hotel tax failed to reach the required two-thirds majority to pass. In June 2005, Team San Jose built the South Hall, a $6.77 million, blue and white tent, adding 80,000 square feet (7,400 m2) of exhibit space\",\n",
    "          \"qas\": [\n",
    "            {\n",
    "              \"question\": \"where is the businesses choosing to go?\",\n",
    "              \"id\": \"1\"\n",
    "            },\n",
    "            {\n",
    "              \"question\": \"how may votes did the ballot measure need?\",\n",
    "              \"id\": \"2\"\n",
    "            },\n",
    "            {\n",
    "              \"question\": \"By what year many Silicon Valley businesses were choosing the Moscone Center?\",\n",
    "              \"id\": \"3\"\n",
    "            }\n",
    "          ]\n",
    "        }\n",
    "      ],\n",
    "      \"title\": \"Conference Center\"\n",
    "    }\n",
    "  ]\n",
    "}"
   ]
  },
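  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The same SQuAD-style structure can also be generated from Python instead of being typed by hand. The following is a minimal sketch, not part of the original tutorial: `build_squad_input` is a hypothetical helper, and the shortened context and question are illustrative only.\n",
    "\n",
    "```python\n",
    "import json\n",
    "\n",
    "\n",
    "# Hypothetical helper: wraps one context paragraph and its questions\n",
    "# in the nested layout used by inputs.json.\n",
    "def build_squad_input(title, context, questions, version='1.4'):\n",
    "    # ids are assigned as the strings '1', '2', ... in question order\n",
    "    qas = [{'question': q, 'id': str(i)} for i, q in enumerate(questions, start=1)]\n",
    "    paragraph = {'context': context, 'qas': qas}\n",
    "    return {'version': version, 'data': [{'paragraphs': [paragraph], 'title': title}]}\n",
    "\n",
    "\n",
    "example = build_squad_input(\n",
    "    'Conference Center',\n",
    "    'A ballot measure failed to reach the required two-thirds majority to pass.',\n",
    "    ['how many votes did the ballot measure need?'],\n",
    ")\n",
    "print(json.dumps(example, indent=2))\n",
    "```\n",
    "\n",
    "Passing the returned dict to `json.dump` would produce a file with the same shape as `inputs.json`."
   ]
  },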
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Step 2 - Download the uncased BERT files, which include the vocabulary (vocab.txt) used for tokenization"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!wget -q https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip\n",
    "!unzip -o uncased_L-12_H-768_A-12.zip"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Step 3 - Preprocessing "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Read the context and questions from the input file and convert them into model features (`input_ids`, `input_mask`, and `segment_ids`)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "WARNING: Logging before flag parsing goes to stderr.\n",
      "W0801 15:12:32.171624 139735288653568 deprecation_wrapper.py:119] From /data/home/t-kupill/tokenization.py:125: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.\n",
      "\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "  \"version\": \"1.4\",\n",
      "  \"data\": [\n",
      "    {\n",
      "      \"paragraphs\": [\n",
      "        {\n",
      "          \"context\": \"In its early years, the new convention center failed to meet attendance and revenue expectations.[12] By 2002, many Silicon Valley businesses were choosing the much larger Moscone Center in San Francisco over the San Jose Convention Center due to the latter's limited space. A ballot measure to finance an expansion via a hotel tax failed to reach the required two-thirds majority to pass. In June 2005, Team San Jose built the South Hall, a $6.77 million, blue and white tent, adding 80,000 square feet (7,400 m2) of exhibit space\",\n",
      "          \"qas\": [\n",
      "            {\n",
      "              \"question\": \"where is the businesses choosing to go?\",\n",
      "              \"id\": \"1\"\n",
      "            },\n",
      "            {\n",
      "              \"question\": \"how may votes did the ballot measure need?\",\n",
      "              \"id\": \"2\"\n",
      "            },\n",
      "            {\n",
      "              \"question\": \"By what year many Silicon Valley businesses were choosing the Moscone Center?\",\n",
      "              \"id\": \"3\"\n",
      "            }\n",
      "          ]\n",
      "        }\n",
      "      ],\n",
      "      \"title\": \"Conference Center\"\n",
      "    }\n",
      "  ]\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "import json\n",
    "import os\n",
    "\n",
    "import numpy as np\n",
    "import onnxruntime as ort\n",
    "import tokenization\n",
    "from run_onnx_squad import *\n",
    "\n",
    "# Load and display the input file written in Step 1\n",
    "input_file = 'inputs.json'\n",
    "with open(input_file) as json_file:\n",
    "    test_data = json.load(json_file)\n",
    "    print(json.dumps(test_data, indent=2))\n",
    "\n",
    "# preprocess input\n",
    "# Use the read_squad_examples function from run_onnx_squad to read the input file\n",
    "eval_examples = read_squad_examples(input_file=input_file)\n",
    "\n",
    "max_seq_length = 256  # matches the model's input shape [None, 256]\n",
    "doc_stride = 128  # stride used when a long context is split into overlapping windows\n",
    "max_query_length = 64  # maximum number of question tokens\n",
    "batch_size = 1\n",
    "n_best_size = 20  # number of candidate answers considered during postprocessing\n",
    "max_answer_length = 30  # maximum answer length in tokens\n",
    "\n",
    "vocab_file = os.path.join('uncased_L-12_H-768_A-12', 'vocab.txt')\n",
    "tokenizer = tokenization.FullTokenizer(vocab_file=vocab_file, do_lower_case=True)\n",
    "\n",
    "# Use the convert_examples_to_features function from run_onnx_squad to build the model inputs\n",
    "input_ids, input_mask, segment_ids, extra_data = convert_examples_to_features(eval_examples, tokenizer,\n",
    "                                                                              max_seq_length, doc_stride, max_query_length)\n"
   ]
  },
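  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the three feature arrays more concrete, here is a minimal sketch of how a single question/context pair is usually packed for BERT. It is not the code in `run_onnx_squad`: `layout_features` and `toy_vocab` are hypothetical, the tokens are assumed to be already split, and long contexts (which a real pipeline would window with `doc_stride`) are simply rejected.\n",
    "\n",
    "```python\n",
    "# Hypothetical helper: pack one question/context pair into fixed-length inputs.\n",
    "def layout_features(query_tokens, doc_tokens, vocab, max_seq_length):\n",
    "    tokens = ['[CLS]'] + query_tokens + ['[SEP]'] + doc_tokens + ['[SEP]']\n",
    "    if len(tokens) > max_seq_length:\n",
    "        raise ValueError('pair does not fit in max_seq_length')\n",
    "    # Segment 0 covers [CLS], the question and the first [SEP];\n",
    "    # segment 1 covers the context and the final [SEP].\n",
    "    segment_ids = [0] * (len(query_tokens) + 2) + [1] * (len(doc_tokens) + 1)\n",
    "    input_ids = [vocab.get(t, vocab['[UNK]']) for t in tokens]\n",
    "    # The mask is 1 for real tokens and 0 for padding.\n",
    "    input_mask = [1] * len(input_ids)\n",
    "    pad = max_seq_length - len(input_ids)\n",
    "    input_ids += [vocab['[PAD]']] * pad\n",
    "    input_mask += [0] * pad\n",
    "    segment_ids += [0] * pad\n",
    "    return input_ids, input_mask, segment_ids\n",
    "\n",
    "\n",
    "# Toy vocabulary for illustration only; the tutorial uses vocab.txt.\n",
    "toy_vocab = {'[PAD]': 0, '[UNK]': 1, '[CLS]': 2, '[SEP]': 3,\n",
    "             'what': 4, 'year': 5, 'by': 6, '2002': 7}\n",
    "ids, mask, segs = layout_features(['what', 'year'], ['by', '2002'], toy_vocab, max_seq_length=8)\n",
    "print(ids)   # [2, 4, 5, 3, 6, 7, 3, 0]\n",
    "print(mask)  # [1, 1, 1, 1, 1, 1, 1, 0]\n",
    "print(segs)  # [0, 0, 0, 0, 1, 1, 1, 0]\n",
    "```\n",
    "\n",
    "In the tutorial itself, every row is padded to `max_seq_length = 256`, which is why the model inputs have shape `[None, 256]`."
   ]
  },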
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Step 4 - Run the ONNX model under onnxruntime "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Create an ONNX Runtime inference session and run the model on each input feature."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "NodeArg(name='unique_ids_raw_output___9:0', type='tensor(int64)', shape=[None])\n",
      "NodeArg(name='segment_ids:0', type='tensor(int64)', shape=[None, 256])\n",
      "NodeArg(name='input_mask:0', type='tensor(int64)', shape=[None, 256])\n",
      "NodeArg(name='input_ids:0', type='tensor(int64)', shape=[None, 256])\n"
     ]
    }
   ],
   "source": [
    "# run inference\n",
    "\n",
    "# Starting with ORT 1.10, you must explicitly set the providers parameter when instantiating InferenceSession\n",
    "# if you want to use execution providers other than the default CPU provider. Previously, providers were\n",
    "# set/registered by default based on the build flags.\n",
    "# For example, if an NVIDIA GPU is available and the ORT Python package is built with CUDA, call the API as follows:\n",
    "# ort.InferenceSession(path/to/model, providers=['CUDAExecutionProvider'])\n",
    "session = ort.InferenceSession('bert.onnx', providers=['CPUExecutionProvider'])\n",
    "\n",
    "for input_meta in session.get_inputs():\n",
    "    print(input_meta)\n",
    "n = len(input_ids)\n",
    "bs = batch_size\n",
    "all_results = []\n",
    "for idx in range(0, n):\n",
    "    # indexing eval_examples by idx assumes one feature per example (each context fits in one window)\n",
    "    item = eval_examples[idx]\n",
    "    # this is using batch_size=1\n",
    "    # feed the input data as int64\n",
    "    data = {\"unique_ids_raw_output___9:0\": np.array([item.qas_id], dtype=np.int64),\n",
    "            \"input_ids:0\": input_ids[idx:idx+bs],\n",
    "            \"input_mask:0\": input_mask[idx:idx+bs],\n",
    "            \"segment_ids:0\": segment_ids[idx:idx+bs]}\n",
    "    result = session.run([\"unique_ids:0\", \"unstack:0\", \"unstack:1\"], data)\n",
    "    in_batch = result[1].shape[0]\n",
    "    for i in range(0, in_batch):\n",
    "        # take the start/end logits of row i rather than always row 0\n",
    "        start_logits = [float(x) for x in result[1][i].flat]\n",
    "        end_logits = [float(x) for x in result[2][i].flat]\n",
    "        unique_id = len(all_results)\n",
    "        all_results.append(RawResult(unique_id=unique_id, start_logits=start_logits, end_logits=end_logits))"
   ]
  },
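  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The start and end logits collected above hold one score per token position. As a rough illustration of how such scores can be turned into an answer span, the sketch below picks the highest-scoring valid (start, end) pair. `best_span` is a hypothetical helper and a simplification: the actual postprocessing in `run_onnx_squad` also maps token positions back to the original text, which is not shown here.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "\n",
    "# Hypothetical helper: return (start, end, score) of the best valid span, or None.\n",
    "def best_span(start_logits, end_logits, n_best_size=20, max_answer_length=30):\n",
    "    start_logits = np.asarray(start_logits, dtype=float)\n",
    "    end_logits = np.asarray(end_logits, dtype=float)\n",
    "    # Only the n_best_size highest-scoring start and end positions are considered.\n",
    "    top_starts = np.argsort(start_logits)[::-1][:n_best_size]\n",
    "    top_ends = np.argsort(end_logits)[::-1][:n_best_size]\n",
    "    best = None\n",
    "    for s in top_starts:\n",
    "        for e in top_ends:\n",
    "            # Skip spans that end before they start or that are too long.\n",
    "            if e < s or e - s + 1 > max_answer_length:\n",
    "                continue\n",
    "            score = start_logits[s] + end_logits[e]\n",
    "            if best is None or score > best[2]:\n",
    "                best = (int(s), int(e), float(score))\n",
    "    return best\n",
    "\n",
    "\n",
    "print(best_span([0.1, 2.0, 0.3, 0.2], [0.0, 0.5, 3.0, 0.1], max_answer_length=2))\n",
    "# (1, 2, 5.0)\n",
    "```\n",
    "\n",
    "Scoring a span as the sum of its start and end logits keeps the search cheap, since only `n_best_size` candidates on each side need to be paired."
   ]
  },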
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Step 5 - Postprocessing "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Write the predictions (the answers to the input questions) to a file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "# postprocessing\n",
    "output_dir = 'predictions'\n",
    "os.makedirs(output_dir, exist_ok=True)\n",
    "output_prediction_file = os.path.join(output_dir, \"predictions.json\")\n",
    "output_nbest_file = os.path.join(output_dir, \"nbest_predictions.json\")\n",
    "write_predictions(eval_examples, extra_data, all_results,\n",
    "                  n_best_size, max_answer_length,\n",
    "                  True, output_prediction_file, output_nbest_file)"
   ]
  },
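  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "An n-best list is typically more useful when the raw span scores are normalized into probabilities. The sketch below shows a numerically stable softmax over a list of scores. It assumes, without confirming it from this notebook, that the n-best file is produced in a similar way; `softmax` and the example scores are illustrative only.\n",
    "\n",
    "```python\n",
    "import math\n",
    "\n",
    "\n",
    "# Illustrative helper: convert raw scores into probabilities that sum to 1.\n",
    "def softmax(scores):\n",
    "    if not scores:\n",
    "        return []\n",
    "    # Subtracting the maximum avoids overflow in exp() for large scores.\n",
    "    max_score = max(scores)\n",
    "    exps = [math.exp(s - max_score) for s in scores]\n",
    "    total = sum(exps)\n",
    "    return [e / total for e in exps]\n",
    "\n",
    "\n",
    "# Made-up candidate scores, highest first.\n",
    "probs = softmax([5.0, 3.3, 1.0])\n",
    "print(probs)\n",
    "```\n",
    "\n",
    "The ordering of the candidates is unchanged by this step; only the scale of the scores changes."
   ]
  },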
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Print the results "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "  \"1\": \"Moscone Center in San Francisco\",\n",
      "  \"2\": \"two-thirds majority\",\n",
      "  \"3\": \"2002\"\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "# print results\n",
    "import json\n",
    "with open(output_prediction_file) as json_file:\n",
    "    predictions = json.load(json_file)\n",
    "    print(json.dumps(predictions, indent=2))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [conda env:py36]",
   "language": "python",
   "name": "conda-env-py36-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
